OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Document Image De-warping Based on Detection of Distorted Text Lines

Internal identifier: 001312 (Main/Exploration); previous: 001311; next: 001313

Authors: Lothar Mischke [Germany]; Wolfram Luther [Germany]

RBID : ISTEX:093C2B6E6B98D0B3A0122B40D31AFB8A26AD5ED1

Abstract

Abstract: Image warping caused by scanning, photocopying or photographing a document is a common problem in the field of document processing and understanding. Distortion within the text documents impairs OCRability and thus strongly decreases the usability of the results. This is one of the major obstacles for automating the process of digitizing printed documents. In this paper we present a novel algorithm which is able to correct document image warping based on the detection of distorted text lines. The proposed solution is used in a recent project of digitizing old, poor quality manuscripts. The algorithm is compared to other published approaches. Experiments with various document samples and the resulting improvements of the text recognition rate achieved by a commercial OCR engine are also presented.

DOI: 10.1007/11553595_131


The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Document Image De-warping Based on Detection of Distorted Text Lines</title>
<author>
<name sortKey="Mischke, Lothar" sort="Mischke, Lothar" uniqKey="Mischke L" first="Lothar" last="Mischke">Lothar Mischke</name>
</author>
<author>
<name sortKey="Luther, Wolfram" sort="Luther, Wolfram" uniqKey="Luther W" first="Wolfram" last="Luther">Wolfram Luther</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:093C2B6E6B98D0B3A0122B40D31AFB8A26AD5ED1</idno>
<date when="2005" year="2005">2005</date>
<idno type="doi">10.1007/11553595_131</idno>
<idno type="url">https://api.istex.fr/document/093C2B6E6B98D0B3A0122B40D31AFB8A26AD5ED1/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000533</idno>
<idno type="wicri:Area/Istex/Curation">000526</idno>
<idno type="wicri:Area/Istex/Checkpoint">000C21</idno>
<idno type="wicri:doubleKey">0302-9743:2005:Mischke L:document:image:de</idno>
<idno type="wicri:Area/Main/Merge">001348</idno>
<idno type="wicri:source">INIST</idno>
<idno type="RBID">Pascal:05-0420709</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000442</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000345</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000414</idno>
<idno type="wicri:doubleKey">0302-9743:2005:Mischke L:document:image:de</idno>
<idno type="wicri:Area/Main/Merge">001444</idno>
<idno type="wicri:Area/Main/Curation">001312</idno>
<idno type="wicri:Area/Main/Exploration">001312</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Document Image De-warping Based on Detection of Distorted Text Lines</title>
<author>
<name sortKey="Mischke, Lothar" sort="Mischke, Lothar" uniqKey="Mischke L" first="Lothar" last="Mischke">Lothar Mischke</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Eduard Spranger Vocational School, Vorheider Weg 8, D-59067, Hamm</wicri:regionArea>
<wicri:noRegion>59067, Hamm</wicri:noRegion>
<wicri:noRegion>Hamm</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Allemagne</country>
</affiliation>
</author>
<author>
<name sortKey="Luther, Wolfram" sort="Luther, Wolfram" uniqKey="Luther W" first="Wolfram" last="Luther">Wolfram Luther</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Institute of Computer Science and Interactive Systems, University of Duisburg–Essen, Lotharstr. 65, D-47048, Duisburg</wicri:regionArea>
<wicri:noRegion>47048, Duisburg</wicri:noRegion>
<wicri:noRegion>Duisburg</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Allemagne</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>2005</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">093C2B6E6B98D0B3A0122B40D31AFB8A26AD5ED1</idno>
<idno type="DOI">10.1007/11553595_131</idno>
<idno type="ChapterID">131</idno>
<idno type="ChapterID">Chap131</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Character recognition</term>
<term>Digitizing</term>
<term>Document processing</term>
<term>Image interpretation</term>
<term>Optical character recognition</term>
<term>Pattern recognition</term>
<term>Printed character</term>
<term>Printed document</term>
<term>Text</term>
<term>Usability</term>
<term>Warping</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Caractère imprimé</term>
<term>Document imprimé</term>
<term>Gauchissement</term>
<term>Interprétation image</term>
<term>Numérisation</term>
<term>Reconnaissance caractère</term>
<term>Reconnaissance forme</term>
<term>Reconnaissance optique caractère</term>
<term>Texte</term>
<term>Traitement document</term>
<term>Utilisabilité</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr">
<term>Numérisation</term>
</keywords>
</textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: Image warping caused by scanning, photocopying or photographing a document is a common problem in the field of document processing and understanding. Distortion within the text documents impairs OCRability and thus strongly decreases the usability of the results. This is one of the major obstacles for automating the process of digitizing printed documents. In this paper we present a novel algorithm which is able to correct document image warping based on the detection of distorted text lines. The proposed solution is used in a recent project of digitizing old, poor quality manuscripts. The algorithm is compared to other published approaches. Experiments with various document samples and the resulting improvements of the text recognition rate achieved by a commercial OCR engine are also presented.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Allemagne</li>
</country>
</list>
<tree>
<country name="Allemagne">
<noRegion>
<name sortKey="Mischke, Lothar" sort="Mischke, Lothar" uniqKey="Mischke L" first="Lothar" last="Mischke">Lothar Mischke</name>
</noRegion>
<name sortKey="Luther, Wolfram" sort="Luther, Wolfram" uniqKey="Luther W" first="Wolfram" last="Luther">Wolfram Luther</name>
<name sortKey="Luther, Wolfram" sort="Luther, Wolfram" uniqKey="Luther W" first="Wolfram" last="Luther">Wolfram Luther</name>
<name sortKey="Mischke, Lothar" sort="Mischke, Lothar" uniqKey="Mischke L" first="Lothar" last="Mischke">Lothar Mischke</name>
</country>
</tree>
</affiliations>
</record>
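
The identifiers in the record above can also be read programmatically. The following is a minimal sketch using Python's standard library, not part of Dilib. It assumes the record is available as a string; because the record as shown uses the `wicri:` prefix without declaring it, the sketch wraps the fragment in a hypothetical `wrapper` element that binds the prefix to a placeholder URI (`urn:example:wicri`, an assumption made only so that the parser accepts the fragment). The helper names `parse_record` and `idno_values` and the shortened `SAMPLE` are illustrative.

```python
import xml.etree.ElementTree as ET

# Placeholder URI (assumption): the record does not declare the wicri: prefix,
# so it is bound here only to make the fragment parseable.
WICRI_NS = "urn:example:wicri"


def parse_record(fragment: str) -> ET.Element:
    """Parse a <record> fragment after binding the undeclared wicri: prefix."""
    wrapped = f'<wrapper xmlns:wicri="{WICRI_NS}">{fragment}</wrapper>'
    return ET.fromstring(wrapped).find("record")


def idno_values(record: ET.Element, idno_type: str) -> list:
    """Return the text of every <idno> whose type attribute equals idno_type."""
    return [e.text for e in record.iter("idno") if e.get("type") == idno_type]


# Shortened excerpt of the record shown above, for illustration only.
SAMPLE = """<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader><fileDesc><publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="doi">10.1007/11553595_131</idno>
<idno type="wicri:Area/Main/Exploration">001312</idno>
</publicationStmt></fileDesc></teiHeader>
</TEI>
</record>"""

record = parse_record(SAMPLE)
print(idno_values(record, "doi"))
print(idno_values(record, "wicri:Area/Main/Exploration"))
```

The match on `type` is case-sensitive, so in the full record `doi` and `DOI` would have to be queried separately.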

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001312 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001312 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:093C2B6E6B98D0B3A0122B40D31AFB8A26AD5ED1
   |texte=   Document Image De-warping Based on Detection of Distorted Text Lines
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024